Parallel database sorting
Abstract
Sorting is frequently required in database processing through the Order By and Distinct clauses in SQL, and it is widely known in the computer science community at large. Sorting in general covers internal and external sorting. Past published work has focused extensively on external sorting on uni-processors (serial external sorting) and on internal sorting on multi-processors (parallel internal sorting). External sorting on multi-processors (parallel external sorting) has received surprisingly little attention; furthermore, the way current parallel database systems sort is far from optimal in many scenarios. In this paper, we present a taxonomy for parallel sorting in parallel database systems, which covers five sorting methods: parallel merge-all sort, parallel binary-merge sort, parallel redistribution binary-merge sort, parallel redistribution merge-all sort, and parallel partitioned sort. The first two methods are previously proposed approaches to parallel external sorting that have been adopted as the status quo of parallel database sorting, whereas the latter three methods, which are based on redistribution and repartitioning, are new and have not been discussed in the literature on parallel external sorting. The performance of these five methods is investigated and the results are reported. © 2002 Elsevier Science Inc. All rights reserved.
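The abstract names the five methods without defining them. As a rough illustration only, the sketch below contrasts two of them under an assumed reading of their names: merge-all as "sort locally on each processor, then merge every run on one host", and partitioned sort as "redistribute records by key range first, then sort each range locally with no final merge". Processors are simulated as plain Python lists, and the function names and the `splitters` parameter are hypothetical, not taken from the paper.

```python
import heapq
from bisect import bisect_right


def merge_all_sort(partitions):
    """Assumed merge-all scheme: local sort per processor, then one k-way merge."""
    local_runs = [sorted(p) for p in partitions]  # each processor sorts its own data
    return list(heapq.merge(*local_runs))         # a single host merges all runs


def partitioned_sort(partitions, splitters):
    """Assumed partitioned scheme: range-redistribute, then sort each range locally."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for p in partitions:
        for key in p:
            # bisect_right picks the key range (destination processor) for this key
            buckets[bisect_right(splitters, key)].append(key)
    return [sorted(b) for b in buckets]           # no final merge step is needed


data = [[9, 3, 7], [1, 8, 2], [6, 4, 5]]          # three simulated processors
merged = merge_all_sort(data)
ranges = partitioned_sort(data, splitters=[3, 6])
```

In this sketch the merge-all variant funnels every record through one merging host, while the partitioned variant leaves the result spread across processors in key-range order, so concatenating `ranges` gives the same sequence as `merged`.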
Similar resources
Sorting in Parallel Database Systems
Sorting in database processing is frequently required through the use of Order By and Distinct clauses in SQL. Sorting is also widely known in computer science community at large. Sorting in general covers internal and external sorting. Past published work has extensively focused on external sorting on uni-processors (serial external sorting), and internal sorting on multiprocessors (parallel i...
Tuning a Parallel Database Algorithm on a Shared-memory Multiprocessor
Database query processing can benefit significantly from parallelism. Parallel database algorithms combine substantial CPU and I/O activity, memory requirements, and massive data exchange between processes, all of which must be considered to obtain optimal performance. Since parallel external sorting is a very typical example, we have focused on sorting to tune Volcano, a new query processing s...
External Sorting for Databases in Distributed Heterogeneous Systems
A common approach to external parallel sorting in parallel database query processing is to split the data of initial runs into partitions. These partitions are assigned statically to the processes of the merge phase to produce a globally sorted result. This strategy may lead to low performance if some processes are overloaded caused by data skew or load imbalances. In this paper we describe a n...
A Fast, Storage-Efficient Parallel Sorting Algorithm
A parallel sorting algorithm is presented for storage-efficient internal sorting on MIMD machines. The algorithm first sorts the elements within each node using a serial sorting algorithm, then uses a two-phase parallel merge. The algorithm is comparison-based and requires additional storage of order the square root of the number of elements in each node. Performance of the algorithm on two general-...
Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting
We consider the problem of external sorting in a shared-nothing multiprocessor. A critical step in the algorithms we consider is to determine the range of sort keys to be handled by each processor. We consider two techniques for determining these ranges of sort keys: exact splitting, using a parallel version of the algorithm proposed by Iyer, Ricard, and Varman; and probabilistic splitting, whi...
Journal title:
- Inf. Sci.
Volume 146, Issue
Pages -
Publication date 2002